Abstract

Education reform efforts are currently at the top of the nation’s agenda. Policymakers are hearing 
increasing calls from members of the public to improve standardized test scores. These reform 
calls are a response to the perceived inadequacy of science teaching in our nation. Data were 
collected from participating states regarding the status of the adoption of science standards and 
their alignment with the National Science Education Standards and the AAAS Benchmarks. 
Additionally, hierarchical information was collected on the match between curriculum and 
assessment. A coding scheme was designed to assess the refinement of standards and the 
availability and match of science testing. Data were collected twice from state departments of 
education over a four-year period to ascertain the match between standards, curriculum, and 
assessment. A significant difference was found in the number of states that now require science 
standards and standardized testing. Alignment of standards, curriculum, and assessment 
prevailed in 2002. The fact that states are requiring testing in science reflects the national
concern for educating our students in science, even if we do not believe that standardized testing 
accurately reflects effective teaching and learning. This project was done with funding from the 
National Science Foundation for the National Institute for Science Education (NISE) housed at 
the University of Wisconsin and the National Center for Science Education in Washington, DC. 
A Study of the Alignment of National Standards, State Standards, and Science Assessment



Introduction: Education Reform in Standards and Testing 

Reform efforts in school science have been at the top of the nation’s agenda for the past 
decade and a half. The reforms are a response to the perceived inadequacy of high school 
graduates in terms of the job market, entrance to college, student motivation for pursuing careers 
in science/engineering/technology, and producing a scientifically/technologically literate 
citizenry. 

The push for national education standards began with the formulation of the legislation
that ultimately became the Goals 2000: Educate America Act of 1994. On March 31 of
that year President Clinton signed the bill into law. He is quoted as saying (in Jennings, 1998),
“This is the beginning. It is the foundation. Today we can say America is serious about
education ... [Goals 2000] sets world-class education standards for what every child at every
American school should know in order to win when he or she becomes an adult. We have never
done it before. We are going to do it now because of this bill” (p. 108).

On January 8, 2002, President Bush signed into law the No Child Left Behind Act of 2001
(NCLB). This law is intended to bring sweeping changes to the Elementary and Secondary
Education Act (ESEA) that will affect kindergarten-through-grade-12 education in America’s
schools. The act requires stronger accountability for educational results, with an increase in
flexibility and local control. According to the US government website for the No Child Left
Behind Act (2003), the “accountable” education system involves several critical steps, which
include that “states create their own standards for what a child should know and learn for all
grades. Standards must be developed in math and reading immediately. Standards must also be
developed for science by the 2005-2006 year. With standards in place, states must test every
student’s progress toward those standards by using tests that are aligned with the standards. ...
Beginning in the 2007-2008 school year, science achievement must also be tested” (p. 1).

Science Standards 

The purpose of this paper is to present the status of standards and testing in science
throughout the US. At three points in time, the existence of state science standards and the
alignment between science standards and testing were researched. No attempt is made to
determine the quality of the standards or assessment instruments; the aim is merely to examine
the status of science standards and assessment. To put this study in context, we need to attend
to the two sets of national science standards.

The U.S. (the world?) has never seen two more gigantic reform initiatives than Project
2061 (AAAS, 1988; Rutherford & Ahlgren, 1989), a project of the American Association for the
Advancement of Science (AAAS) supported by the Carnegie and Mellon Foundations as well as
the National Science Foundation (NSF), and Scope, Sequence, and Coordination (SS&C)
(Aldridge, 1992; Yager, 1993), a project of the National Science Teachers Association (NSTA)
supported by NSF, the Department of Education, and industries such as the American Petroleum
Institute. Each of these projects has attracted support in excess of $20 million.

Both Project 2061 and SS&C have influenced the final version of the National Science
Education Standards (NSES) by the National Research Council (NRC, 1996) of the National
Academy of Sciences, with $7 million support over a four-year period involving over 3,000
professionals. These Standards, finalized and distributed in December 1995, represent the
reform well. The hope is that the Standards will hasten the reforms and move the nation toward
meeting the goals of our national Goals 2000: Educate America Act of 1994 (Jennings, 1998),
especially as they pertain to science and mathematics. Of course, advances and improvement of
school science and mathematics are intimately related to general reform of the nation’s schools.

However, are standards alone enough to strengthen science achievement in our
educational institutions, K-graduate levels? Results of the Third International Mathematics and
Science Study (TIMSS) ranked the US near the top at grade four and in the lower middle at
grades 8 and 11 (Martin, 1996). The TIMSS studies are the most ambitious to date, examining
about 45 countries at three grade levels, and they indicate that the US has a way to go to be a
world leader in K-12 education.

According to Webb (1997), “Assuring the alignment between expectations and 
assessments can strengthen an education system in important ways. Teachers give more 
credence to documents they understand are in agreement, are useful, and will serve to benefit 
their students. Teachers, already overloaded with responsibilities are better able to attend to 
expectations and assessments if they provide a consistent message and have credibility.” 

Purposes of Science Testing 

Often the rationale for assessing student learning in science is to improve science
instruction and science programs and to report information to students, parents, teachers, and
administrators regarding the status of individuals, classes, school districts, states, and ultimately
the nation, as in TIMSS, thus making educators accountable (Raizen, Baron, Champagne,
Haertel, Mullis, & Oakes, 1989). Types of achievement that are generally assessed “...include
knowledge of facts and concepts, science process skills, science thinking and problem solving
skills, skills needed to manipulate laboratory equipment, and the disposition to apply science
knowledge in skills” (Raizen, Baron, Champagne, Haertel, Mullis, & Oakes, 1991; Swain, 1985).
According to Doran, Lawrenz, and Helgeson (1994), who wrote the seminal chapter on research
on assessment in science, “It is widely accepted that achievement tests strongly influence and
direct curriculum development” (p. 395). It is impossible to overlook the impact of assessment.
Shavelson, Carey, and Webb (1990) stated that “Developers of scholastic tests have become
overseers of a very powerful instrument of education policy making: achievement tests” (p.
692).

Method 

Testing coordinators and curriculum specialists at state departments of education were
contacted in summer 1998, summer 2001, and summer 2002. All three times the status of
science standards and the match between curriculum and testing were ascertained. The questions
for 2001 and 2002 were asked by email, while the earlier questions were asked by phone. Email
questions were more direct and elicited clearer responses. This may also be a function of states
being more involved in science standards and testing on science content in 2001 and 2002 than
they were in 1998. Table 1 details the information requested by email. Telephone conversations
in 1998 addressed similar questions. Responses were followed by a review of the information on
state department websites. A website with hotlinks to state departments of education was created
in order to access their standards and testing information. The URL for the website is
bama.ua.edu/~jstock/.

Draft March 2003

Table 1. Email Questions and State Reporting Form

Standards Information

• State
• Are state standards in place? (Yes or No)
• Are your state standards aligned with national standards? If so, are they the
  National Science Education Standards (NRC), the Benchmarks (AAAS), or other?
• What grade levels are covered by the standards?

Testing Information

• Is a science-testing program in place? (Yes or No)
• Grade level(s)
• Test(s) used
• Writer(s) of test(s)
• Are tests norm-referenced (NRT) or criterion-referenced (CRT)?
• Description of how tests match standards (very closely match, closely match, etc.)



States were assigned a coded value based upon the information provided regarding the
nature of their science standards and science testing. Coding went as follows: 0) no response;
1) no curriculum standards/no assessment in science; 2) adoption of national standards/state
standards, no science assessment; 3) standardized test (no curriculum alignment with standards);
and 4) assessment aligned with state/national standards. See Table 2. An example of the coding
for the state of Alabama can be found in Table 3. Alabama has state standards that are aligned
with national standards. The state was given a 4 on the match between standards and testing
given the data provided on the Alabama State Department website. The state tests in science
using the Stanford 9 at grades 4, 5, 7, and 9 and the High School Graduation Examination at
grades 11 and 12. Those who do not pass the exam in grade 11 must retake the exam in grade 12.
The exams are both norm referenced and criterion referenced. This procedure was followed for
all 50 states, Washington DC, and Puerto Rico. A caveat should be noted in that states are
constantly in the process of changing standards and assessments; therefore, things may have
changed between the time that these data were collected and now.



Table 2. Coding for Correspondence Between Standards and Assessment

Code   Explanation
0      No Response
1      No Curriculum Standards/No Assessment In Science
2      Adoption Of National Standards/State Standards - No Science Assessment
3      Standardized Test, No Curriculum Alignment With Standards
4      Assessment Aligned With Local/State/National Standards
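
The coding rules in Table 2 can be read as a small decision procedure. The sketch below is only an illustration of that reading, not the procedure used in the study: the `StateResponse` record and its field names are hypothetical, and treating a state that tests in science without having standards as code 3 is an assumption, since the table does not address that case.

```python
from dataclasses import dataclass


@dataclass
class StateResponse:
    """Hypothetical summary of one state's answers to the Table 1 questions."""
    responded: bool
    has_standards: bool = False
    has_science_test: bool = False
    test_aligned: bool = False


def alignment_code(r: StateResponse) -> int:
    """Map a response to the 0-4 codes of Table 2 (illustrative reading)."""
    if not r.responded:
        return 0  # no response
    if r.has_science_test:
        # Code 4 needs a test aligned with standards; otherwise code 3.
        # Assumption: a test without any standards also counts as code 3.
        return 4 if (r.has_standards and r.test_aligned) else 3
    # No science assessment: code 2 if standards exist, else code 1.
    return 2 if r.has_standards else 1


# Alabama as described in the text: standards in place, tests aligned.
alabama = StateResponse(responded=True, has_standards=True,
                        has_science_test=True, test_aligned=True)
print(alignment_code(alabama))  # 4
```

Under these assumptions the Alabama example receives a 4, which agrees with the code reported for that state.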



Table 3. Example for Coding the Correspondence Between Standards and Testing

State: Alabama
Assessment Coding: 4
Name of Test and Publisher (Including Form and Edition): Stanford 9 by Harcourt Brace
  Educational Measurement; Alabama High School Graduation Exam (AHSGE)
Grade Levels Tested: 4, 5, 7, 8 - Stanford 9; 10, 11 - AHSGE
Type of Assessment (CRT or NRT): NRT and CRT



Data Analysis 

Approximate frequencies were tabulated based upon coded data for 1998, 2001, and
2002. These data included projections for the ensuing school year. Approximate frequencies are
used here because it was sometimes difficult to determine the outcome based upon the response.
This was especially true in 1998. Table 4 details the frequencies for each coded level. For
example, in 1998 17 states had no curriculum standards in science, while in 2001 there were two
and in 2002 there was only one. Table 5 lists the percentages of state coding from the preceding
table based upon the 50 US states, Washington DC, and Puerto Rico. For coded level one, 33
percent of the states did not have any curriculum standards in science in 1998, while in 2001 the
figure was 4 percent and in 2002 it was only 2 percent. Some states are not represented because
information was not obtained by email or was not available on the state’s web site. A chi-square
analysis was done on the frequencies, yielding p = .19. This means that, if there were no real
difference, data like these would be expected by chance alone with probability .19, which is not
statistically significant. However, there may be practical significance, given that there is a
definite upward trend.
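
For readers unfamiliar with the technique, the following is a minimal sketch of how a chi-square statistic for a year-by-code table is computed. It assumes a test of independence on the Table 4 frequencies with "No Response" kept as a column. The source does not say which frequencies or layout entered its analysis, so this sketch is illustrative only and is not expected to reproduce the reported p value. The function name is hypothetical.

```python
# Frequencies from Table 4: codes 1, 2, 3, 4, and no response, by year.
observed = {
    1998: [17, 4, 18, 0, 13],
    2001: [2, 10, 4, 28, 8],
    2002: [1, 11, 2, 34, 4],
}


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom, expected counts) for a
    rows-by-columns contingency table given as {row_label: [counts]}."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[rt * ct / grand_total for ct in col_totals]
                for rt in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_square_statistic(observed)
print(f"chi-square = {statistic:.2f}, df = {dof}")
```

A p value would then be read from the chi-square distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(statistic, dof)` if SciPy is available.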



Table 4. Coded Frequencies for State Data for Years 1998, 2001, and 2002

          Code
Year      1      2      3      4      No Response
1998      17     4      18     0      13
2001      2      10     4      28     8
2002      1      11     2      34     4

N = 52 (50 states plus Washington DC and Puerto Rico)



Table 5. Percentage Comparisons of Coded Data for Years 1998, 2001, and 2002

          Code
Year      1      2      3      4      No Response
1998      33     8      35     0      25
2001      4      19     8      54     15
2002      2      21     4      65     8

N = 52 (50 states plus Washington DC and Puerto Rico)
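
The Table 5 percentages follow from the Table 4 frequencies divided by N = 52 and rounded to the nearest whole percent. A minimal sketch of that arithmetic is given below; the variable and function names are illustrative and do not come from the study.

```python
N_STATES = 52  # 50 states plus Washington DC and Puerto Rico

# Table 4 frequencies: codes 1, 2, 3, 4, and no response, by year.
frequencies = {
    1998: [17, 4, 18, 0, 13],
    2001: [2, 10, 4, 28, 8],
    2002: [1, 11, 2, 34, 4],
}


def to_percentages(counts, total=N_STATES):
    """Convert counts to whole-number percentages of `total`."""
    return [round(100 * c / total) for c in counts]


percentages = {year: to_percentages(counts)
               for year, counts in frequencies.items()}

for year, row in percentages.items():
    print(year, row)
# 1998 [33, 8, 35, 0, 25]
# 2001 [4, 19, 8, 54, 15]
# 2002 [2, 21, 4, 65, 8]
```

Because each cell is rounded separately, a row need not sum to exactly 100 (the 1998 row sums to 101).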



Standardized tests used for statewide science assessment included the Stanford 9, published
by Harcourt Educational Measurement; TerraNova, published by CTB/McGraw Hill; and the Iowa
Tests of Basic Skills, published by Riverside Publishing. These main publishers were often the
contractors for adaptations of tests or item generation for various state departments. States
writing some form of their own tests are listed in Table 7. These data in particular are subject to
interpretation based upon the information provided from the questionnaire and the web site.
Since many states are in a state of flux, some of this information may have changed since these
data were collected.

Standardized Tests and Their Publishers             Frequencies and States
Stanford 9 (Harcourt Educational Measurement)       5 (Alabama, California, Georgia,
                                                      Oklahoma, West Virginia)
California (Standards Tests?)                       2 (California, Minnesota)
  (Harcourt Educational Measurement)
TerraNova (CTB/McGraw Hill)                         4 (Nevada, New Mexico, South
                                                      Carolina, Wisconsin)
Iowa Test of Basic Skills                           3 (Iowa, Minnesota, Nevada)
Iowa Test of Educational Development
Third International Math and Science Study          1 (Minnesota)
  (TIMSS)
National Assessment of Educational Progress         1 (Minnesota)
  (NAEP)
Metropolitan Achievement Test, 7th edition          1 (Minnesota)
  (MAT7)



Table 7. State Written/Contracted Standardized Tests

Test Name                                         States
Alabama High School Graduation Exam               Alabama
Golden State Exam                                 California
California Standards Test                         California
  (Harcourt Educational Measurement)
Connecticut Academic Performance Test             Connecticut
  (Harcourt Educational Measurement)
Delaware Student Assessment Program               Delaware
  (Harcourt Educational Measurement)
Kentucky Core Content Test                        Kentucky
Maine Educational Assessment                      Maine
  (Measured Progress)
Maryland School Performance Assessment Program    Maryland
  (Publisher: CTB/McGraw Hill)
Massachusetts Department of Education             Massachusetts
Michigan Department of Education                  Michigan
Subject Area Testing Program                      Mississippi
  (Harcourt Educational Measurement)
Missouri Assessment Program                       Missouri
  (CTB McGraw Hill)
NH Teachers                                       New Hampshire
  (Outside Contractor)
New Mexico Supplement                             New Mexico
Elementary Level Science Test                     New York
Intermediate Level Science Test
New York Regents Examination
NC Department of Public Instruction               North Carolina
Oklahoma End of Course Exams                      Oklahoma
Oregon Department of Education                    Oregon
Palmetto Achievement Challenge Test (PACT)        South Carolina
Dakota Assessment of Content Standards (DACS)     South Dakota
  (EdVISION)
Tennessee Course Assessment Program (TCAP)        Tennessee
  (CTB/McGraw Hill)
Texas Assessment of Knowledge and Skills (TAKS)   Texas
Vermont-Partnership for the Assessment of         Vermont
  Standards-based Science (PASS)
  (WestEd & ETS)
Standards of Learning (SOL)                       Virginia
  (Harcourt Educational Measurement)
Wisconsin customized TerraNova                    Wisconsin



Conclusions 

Many more states presently test in science than in 1998, and there has been an increase in the
match between states’ science standards and their testing programs. Since AAAS’s (1988) initial
plea for “science for all Americans,” this is a major accomplishment for the US, and the
percentages indicate a steady increase in the number of states with science testing aligned with
standards. We have gone from 0% to 65% in four years (Table 5); however, there is still room for
improvement. As was stated previously, the US ranked high in the TIMSS study only for 4th
grade, and we were only mediocre for grades 7 and 11. Many states have included aspects of
performance assessment, which moves us to a higher cognitive level of testing for the most part.
Most states involve testing companies as contractors but do not use a complete test outright,
which says something for state initiatives. Still, many states rely on standardized testing from
some vendor to supply them with items. Since the standards came first, we can make the
assumption that they have driven achievement testing in science, but can we assume that this is
for the better? We can, however, assume that state departments of education are paying more
attention to science education. With the current No Child Left Behind Act (2003), we can assume
that in some way states will have science standards and a science testing program in place by
2007-2008 for grades 3 through 8. Hopefully, this type of study will be conducted again by 2008.

There is a caveat in that even though we have a definite increase in science achievement,
the TIMSS suggests that we not rest in our quest for getting science into our classrooms. The
National Assessment of Educational Progress (NAEP) studies place the majority of students at a
basic level of science understanding. The three NAEP levels are: 1) basic; 2) proficient; and 3)
advanced. A very small percentage (around 1-4%) is at the advanced level of understanding.

Is there validity and value in statewide assessment of science? Most of the tests used are 
valid in that they reflect the interests of the various states. Many of them are indeed excellent 
tests. Is there value in these assessments? This is a value judgment. There are those who do not 
believe in standardized testing at all for science, in which case the answer is “no.” However, if 
we never had a ruler or a yardstick we would not know how long things are. By the same token, 
if we did not have assessments we would not have any achievement indicators. These tests do 
measure achievement even if they are flawed in some or many ways. If we look at these tests as 
crude measures with a lot of variability, they do provide us with valuable information. 

In order to prepare today’s students for tomorrow’s world we need to reculture today’s 
schools. In order for reculturalization to happen, we need to implement successful systemic 
changes. We need to manage change of the critical masses by using continuous strategies that 
are most likely to mobilize large numbers of people in new directions (Fullan, 1996). According 
to Fullan “...systemic reform is partly a matter of redesigning the objective systems of 
interrelationships so that obvious structural faults are corrected. However, it mainly involves 
strategies (such as networking and reculturing) that help develop and mobilize the conceptions,
skills, and motivation in the minds and hearts of scores of educators” (p. 422). 

Standards and tests may not be the answer for achieving a high level of science literacy, 
but they do provide a type of momentum for reform. It is up to the teachers, educators, and 
researchers to make sure that they provide information and the interpretation that will enable us 
to examine science achievement. Hopefully, these tests will provide a vehicle for promoting the 
learning of science for all students. It appears that states and policy makers are moving to make 
educators more accountable in science. Aligning standards and testing is a major contribution to 
observable accountability. Whether or not we are teaching and testing “science for all 
Americans” and whether we are producing students who are literate remains to be seen. 
Assessment is only one way of seeing and our observations may be in error; however, are there 
more accurate ways of seeing? 